Modern statistical learning algorithms are capable of amazing flexibility, but struggle with interpretability. One possible solution is sparsity: making inference such that many of the parameters are estimated as being identically 0, which may be imposed through the use of nonsmooth penalties such as the $\ell_1$ penalty. However, the $\ell_1$ penalty introduces significant bias when high sparsity is desired. In this article, we retain the $\ell_1$ penalty, but define learnable penalty weights $\lambda_p$ endowed with hyperpriors. We start the article by investigating the optimization problem this poses, developing a proximal operator associated with the $\ell_1$ norm. We then study the theoretical properties of this variable-coefficient $\ell_1$ penalty in the context of penalized likelihood. Next, we investigate application of this penalty to Variational Bayes, developing a model we call the Sparse Bayesian Lasso which allows for behavior qualitatively like Lasso regression to be applied to arbitrary variational models. In simulation studies, this gives us the Uncertainty Quantification and low bias properties of simulation-based approaches with an order of magnitude less computation. Finally, we apply our methodology to a Bayesian lagged spatiotemporal regression model of internal displacement that occurred during the Iraqi Civil War of 2013-2017.
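As a point of reference for the optimization problem mentioned above, the proximal operator of an $\ell_1$ penalty with fixed per-coefficient weights reduces to coordinate-wise soft-thresholding. The sketch below shows only that fixed-weight case. The names `soft_threshold`, `prox_weighted_l1`, and `step` are illustrative assumptions, and the article's own operator, which treats the weights $\lambda_p$ as learnable, may differ.

```python
def soft_threshold(x, lam, step=1.0):
    """Proximal operator of step * lam * |x| for a scalar x (soft-thresholding)."""
    t = step * lam
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def prox_weighted_l1(beta, lams, step=1.0):
    """Apply soft-thresholding coordinate-wise, one penalty weight per coefficient.

    Assumption: the weights `lams` are held fixed here; the abstract describes
    learnable weights, which this sketch does not model.
    """
    return [soft_threshold(b, lam, step) for b, lam in zip(beta, lams)]


beta = [3.0, -0.5, 1.2]
lams = [1.0, 1.0, 2.0]
shrunk = prox_weighted_l1(beta, lams)
print(shrunk)
```

Coefficients whose magnitude falls below their weight are set exactly to zero, which is the sparsity mechanism the abstract refers to; larger coefficients are shrunk by their weight, which is the source of the bias it mentions.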
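Bayesian optimization is a form of sequential design: idealize input-output relationships with a suitably flexible nonlinear regression model; fit it to data from an initial experimental campaign; devise and optimize a criterion for selecting the next experimental condition(s) under the fitted model (e.g., via the predictive equations) to target outcomes of interest (say, minima); then acquire outputs under those conditions, update the fit, and repeat. In many situations this "inner optimization" over the new-data acquisition criterion is cumbersome because it is non-convex/highly multi-modal, may be non-differentiable, or may otherwise thwart numerical optimizers, especially when inference requires Monte Carlo. In such cases it is not uncommon to replace the continuous search with a discrete one over random candidates. Here we propose using candidates based on a Delaunay triangulation of the existing input design. In addition to detailing the construction of these "tricands", based on a simple wrapper around a conventional convex hull library, we promote several advantages based on properties of the geometric criterion involved. We then demonstrate empirically how tricands can lead to better Bayesian optimization performance compared to both numerically optimized acquisitions and random candidate-based alternatives.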
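To make the idea of triangulation-based candidates concrete, here is a minimal two-dimensional sketch. It is not the authors' construction: the abstract describes a wrapper around a convex hull library, whereas this version substitutes a brute-force empty-circumcircle test so that it needs only the standard library. It returns only interior candidates (triangle centroids) and assumes the design points are in general position. All function names are hypothetical.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared) of the circle through
    a, b, c, or None if the three points are (nearly) collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_triangles(points):
    """Brute-force 2D Delaunay triangulation: keep every triangle whose
    circumcircle contains no other design point. Fine for small designs only."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        others = (p for idx, p in enumerate(points) if idx not in (i, j, k))
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-12 for px, py in others):
            triangles.append((i, j, k))
    return triangles


def centroid_candidates(points, triangles):
    """One candidate per triangle, placed at its centroid (an assumption;
    other placements inside each triangle are possible)."""
    return [
        (sum(points[v][0] for v in tri) / 3.0, sum(points[v][1] for v in tri) / 3.0)
        for tri in triangles
    ]


# Existing design: four corners of the unit square plus one interior point.
design = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.4, 0.5)]
triangles = delaunay_triangles(design)
candidates = centroid_candidates(design, triangles)
print(triangles)
print(candidates)
```

An acquisition criterion would then be evaluated only at `candidates` rather than optimized over the continuous input space. Because each candidate sits inside a triangle of existing points, candidates spread out where the design is sparse and cluster where it is dense.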
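In the continual effort to improve product quality and decrease operating costs, computational modeling is increasingly being deployed to determine the feasibility of product designs or configurations. Surrogate modeling of these computer experiments via local models, which induce sparsity by considering only short-range interactions, can tackle huge analyses of complicated input-output relationships. However, narrowing the focus to the local scale means that global trends must be re-learned over and over again. In this article, we propose a framework for incorporating information from a global sensitivity analysis into the surrogate model as an input rotation and rescaling preprocessing step. We discuss the relationship between several sensitivity analysis methods based on kernel regression before describing how they give rise to a transformation of the input variables. Specifically, we perform an input warping such that the "warped simulator" is equally sensitive to all input directions, freeing local models to focus on local dynamics. Numerical experiments on observational data and benchmark test functions, including a high-dimensional computer simulator from the automotive industry, provide empirical validation.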
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
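The greedy rule described above can be illustrated on a toy problem where oracle access to the data distribution is available as an explicit probability table. This is only a sketch of the oracle policy, not the amortized learning approach the abstract proposes. The table layout and the names `conditional_mi` and `greedy_next_feature` are assumptions made for illustration.

```python
from collections import defaultdict
from math import log2


def conditional_mi(joint, feature, observed):
    """I(Y; X_feature | observed) in bits for a discrete joint table.

    `joint` maps outcomes (x_0, ..., x_{d-1}, y) to probabilities and
    `observed` maps already-queried feature indices to their values.
    """
    pxy = defaultdict(float)
    for outcome, p in joint.items():
        if all(outcome[j] == v for j, v in observed.items()):
            pxy[(outcome[feature], outcome[-1])] += p
    total = sum(pxy.values())
    if total == 0.0:
        return 0.0  # the observed values have zero probability under the table
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in pxy.items():
        px[x] += p / total
        py[y] += p / total
    return sum(
        (p / total) * log2((p / total) / (px[x] * py[y]))
        for (x, y), p in pxy.items()
        if p > 0.0
    )


def greedy_next_feature(joint, observed, n_features):
    """Pick the unobserved feature with the largest conditional mutual information."""
    remaining = [j for j in range(n_features) if j not in observed]
    return max(remaining, key=lambda j: conditional_mi(joint, j, observed))


# Toy distribution: x0 and x1 are independent fair bits and y copies x0,
# so x0 is fully informative about y and x1 is pure noise.
joint = {(x0, x1, x0): 0.25 for x0 in (0, 1) for x1 in (0, 1)}
first = greedy_next_feature(joint, observed={}, n_features=2)
print(first, conditional_mi(joint, 0, {}), conditional_mi(joint, 1, {}))
```

In practice the joint table is unknown, which is why the abstract turns to a learned policy that is trained to imitate this rule.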
We demonstrate how efficient autonomous drone swarms can be in detecting and tracking occluded targets in densely forested areas, such as lost people during search and rescue missions. Exploration and optimization of local viewing conditions, such as occlusion density and target view obliqueness, provide much faster and much more reliable results than previous, blind sampling strategies that are based on pre-defined waypoints. An adapted real-time particle swarm optimization and a new objective function are presented that are able to deal with dynamic and highly random through-foliage conditions. Synthetic aperture sensing is our fundamental sampling principle, and drone swarms are employed to approximate the optical signals of extremely wide and adaptable airborne lenses.
Sequential testing, always-valid $p$-values, and confidence sequences promise flexible statistical inference and on-the-fly decision making. However, unlike fixed-$n$ inference based on asymptotic normality, existing sequential tests either make parametric assumptions and end up under-covering/over-rejecting when these fail or use non-parametric but conservative concentration inequalities and end up over-covering/under-rejecting. To circumvent these issues, we sidestep exact at-least-$\alpha$ coverage and focus on asymptotically exact coverage and asymptotic optimality. That is, we seek sequential tests whose probability of ever rejecting a true hypothesis asymptotically approaches $\alpha$ and whose expected time to reject a false hypothesis approaches a lower bound on all tests with asymptotic coverage at least $\alpha$, both under an appropriate asymptotic regime. We permit observations to be both non-parametric and dependent and focus on testing whether the observations form a martingale difference sequence. We propose the universal sequential probability ratio test (uSPRT), a slight modification to the normal-mixture sequential probability ratio test, where we add a burn-in period and adjust thresholds accordingly. We show that even in this very general setting, the uSPRT is asymptotically optimal under mild generic conditions. We apply the results to stabilized estimating equations to test means, treatment effects, etc. Our results also provide corresponding guarantees for the implied confidence sequences. Numerical simulations verify our guarantees and the benefits of the uSPRT over alternatives.
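For orientation, the normal-mixture sequential probability ratio test that the uSPRT modifies can be sketched as follows. The sketch assumes unit-variance observations, a $N(0, \tau^2)$ mixing distribution over the mean, and a null hypothesis of mean zero. The burn-in is implemented simply by not testing during the first `burn_in` observations, and the threshold is left at $1/\alpha$. The abstract says thresholds are adjusted but does not say how, so both choices here are assumptions and this is not the uSPRT itself.

```python
from math import log


def mixture_sprt(xs, alpha=0.05, tau2=1.0, burn_in=10):
    """Return the first 1-based index at which the normal-mixture SPRT rejects
    H0 (mean zero), or None if it never rejects on this sample.

    With S_n the running sum of unit-variance observations, the log mixture
    likelihood ratio is
        tau2 * S_n**2 / (2 * (1 + n * tau2)) - 0.5 * log(1 + n * tau2).
    """
    log_threshold = log(1.0 / alpha)  # assumption: unadjusted 1/alpha threshold
    s = 0.0
    for n, x in enumerate(xs, start=1):
        s += x
        log_lr = tau2 * s * s / (2.0 * (1.0 + n * tau2)) - 0.5 * log(1.0 + n * tau2)
        # Assumption: during the burn-in we accumulate data but never reject.
        if n > burn_in and log_lr >= log_threshold:
            return n
    return None


shifted = [1.0] * 50   # constant drift away from the null
centered = [0.0] * 50  # no evidence against the null
print(mixture_sprt(shifted), mixture_sprt(shifted, burn_in=0), mixture_sprt(centered))
```

On the drifting sequence the statistic first crosses the threshold at the tenth observation, so a ten-observation burn-in delays rejection by one step; on the centered sequence the statistic stays negative and the test never rejects.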
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but the quality bar for medical and clinical applications is high. Today, attempts to assess models' clinical knowledge typically rely on automated evaluations on limited benchmarks. There is no standard to evaluate model predictions and reasoning across a breadth of tasks. To address this, we present MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries; and HealthSearchQA, a new free-response dataset of medical questions searched online. We propose a framework for human evaluation of model answers along multiple axes including factuality, precision, possible harm, and bias. In addition, we evaluate PaLM (a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA, MMLU clinical topics), including 67.6% accuracy on MedQA (US Medical License Exam questions), surpassing prior state-of-the-art by over 17%. However, human evaluation reveals key gaps in Flan-PaLM responses. To resolve this we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, recall of knowledge, and medical reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal important limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLM models for clinical applications.
Transformers have been essential to pretraining success in NLP. Other architectures have been used, but require attention layers to match benchmark accuracy. This work explores pretraining without attention. We test recently developed routing layers based on state-space models (SSM) and model architectures based on multiplicative gating. Used together these modeling choices have a large impact on pretraining accuracy. Empirically the proposed Bidirectional Gated SSM (BiGS) replicates BERT pretraining results without attention and can be extended to long-form pretraining of 4096 tokens without approximation.
In this paper, we present strong baselines for the task of Feedback Comment Generation for Writing Learning. Given a sentence and an error span, the task is to generate a feedback comment explaining the error. Sentences and feedback comments are both in English. We experiment with LLMs and also create multiple pseudo datasets for the task, investigating how it affects the performance of our system. We present our results for the task along with extensive analysis of the generated comments with the aim of aiding future studies in feedback comment generation for English language learners.
In order for automated mobile vehicles to navigate in the real world with minimal collision risks, it is necessary for their planning algorithms to consider uncertainties from measurements and environmental disturbances. In this paper, we consider analytical solutions for a conservative approximation of the mutual probability of collision between two robotic vehicles in the presence of such uncertainties. Therein, we present two methods, which we call unitary scaling and principal axes rotation, for decoupling the bivariate integral required for efficient approximation of the probability of collision between two vehicles including orientation effects. We compare the conservatism of these methods analytically and numerically. By closing a control loop through a model predictive guidance scheme, we observe through Monte-Carlo simulations that directly implementing collision avoidance constraints from the conservative approximations remains infeasible for real-time planning. We then propose and implement a convexification approach based on the tightened collision constraints that significantly improves the computational efficiency and robustness of the predictive guidance scheme.
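One way to see why rotating to principal axes decouples the bivariate integral is the following sketch. It assumes the relative position of the two vehicles is Gaussian with mean `mu` and covariance `cov`, that a collision means the relative position lies inside a disc of combined radius `radius`, and that the disc is over-approximated by its bounding square aligned with the principal axes. These modeling choices and all names are assumptions for illustration; the paper's own methods, including its treatment of orientation, may differ.

```python
from math import atan2, cos, erf, exp, sin, sqrt


def interval_prob(mean, var, half_width):
    """P(|Z| <= half_width) for Z ~ N(mean, var)."""
    s = sqrt(2.0 * var)
    return 0.5 * (erf((half_width - mean) / s) - erf((-half_width - mean) / s))


def collision_prob_upper_bound(mu, cov, radius):
    """Conservative (upper) bound on P(||r|| <= radius) for r ~ N(mu, cov) in 2D.

    Rotating into the eigenbasis of `cov` makes the two coordinates independent,
    so the probability of the axis-aligned bounding square of the disc factors
    into a product of two one-dimensional Gaussian integrals.
    """
    (a, b), (_, c) = cov
    theta = 0.5 * atan2(2.0 * b, a - c)  # rotation angle of the principal axes
    ct, st = cos(theta), sin(theta)
    # Mean and variances expressed in the rotated (principal-axis) frame.
    m1 = ct * mu[0] + st * mu[1]
    m2 = -st * mu[0] + ct * mu[1]
    v1 = a * ct * ct + 2.0 * b * st * ct + c * st * st
    v2 = a * st * st - 2.0 * b * st * ct + c * ct * ct
    return interval_prob(m1, v1, radius) * interval_prob(m2, v2, radius)


# Sanity check against a case with a closed form: zero mean, identity
# covariance, unit radius, where the exact disc probability is 1 - exp(-1/2).
bound = collision_prob_upper_bound((0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]], 1.0)
exact = 1.0 - exp(-0.5)
correlated = collision_prob_upper_bound((3.0, 1.0), [[2.0, 1.0], [1.0, 2.0]], 1.0)
print(bound, exact, correlated)
```

Because the square contains the disc, the product never underestimates the collision probability, which is the sense in which such an approximation is conservative; the gap between `bound` and `exact` in the check above shows the price paid for the decoupling.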